The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2009
Authors
Abstract
We describe our Arabic-to-English and Turkish-to-English machine translation systems that participated in the IWSLT 2009 evaluation campaign. Both systems are based on the Moses statistical machine translation toolkit, with added components to address the rich morphology of the source languages. Three different morphological approaches are investigated for Turkish. Our primary submission uses linguistic morphological analysis and statistical disambiguation to generate morpheme-based translation models, which is the approach with the best translation performance of the three. One of the contrastive submissions uses unsupervised subword segmentation to generate non-linguistic subword-based translation models, while another contrastive system uses word-based models but relies on lexical approximation to cope with out-of-vocabulary words, similar to the approach in our Arabic-to-English submission.
Similar resources
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2008
In this study, the TÜBİTAK-UEKAE statistical machine translation system based on the open-source phrase-based statistical machine translation software, Moses, is presented. Additionally, phrase-table augmentation is applied to maximize source language coverage; lexical approximation is applied to replace out-of-vocabulary words with known words prior to decoding; and automatic punctuation insert...
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2007
We describe the TÜBİTAK-UEKAE system that participated in the Arabic-to-English and Japanese-to-English translation tasks of the IWSLT 2007 evaluation campaign. Our system is built on the open-source phrase-based statistical machine translation software Moses. Among available corpora and linguistic resources, only the supplied training data and an Arabic morphological analyzer are used in the sys...
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2010
We report on our participation in the IWSLT 2010 evaluation campaign. Similar to previous years, our submitted systems are based on the Moses statistical machine translation toolkit. This year, we also experimented with hierarchical phrase-based models. In addition, we utilized automatic minimum error-rate training instead of manually-guided tuning. We focused more on the BTEC Turkish-English ta...
The TÜBİTAK statistical machine translation system for IWSLT 2012
We describe the TÜBİTAK submission to the IWSLT 2012 Evaluation Campaign. Our system development focused on utilizing Bayesian alignment methods such as variational Bayes and Gibbs sampling in addition to the standard GIZA++ alignments. The submitted tracks are the Arabic-English and Turkish-English TED Talks translation tasks.
LIUM's statistical machine translation systems for IWSLT 2009
This paper describes the systems developed by the LIUM laboratory for the 2009 IWSLT evaluation. We participated in the Arabic and Chinese/English BTEC tasks. We developed three different systems: a statistical phrase-based system using the Moses toolkit, a Statistical Post-Editing (SPE) system and a hierarchical phrase-based system based on Joshua. A continuous space language model was deploy...